A concatenative speech synthesis method using context dependent phoneme sequences with variable length as search units

Authors

  • Hiroyuki Segi
  • Tohru Takagi
  • Takayuki Ito
Abstract

This paper proposes a new concatenative speech synthesis method that uses variable-length, context-dependent phoneme sequences as search units. Using Japanese broadcast news programs as a speech database, we synthesize Japanese news sentences that are not included in that database and perform subjective evaluations of the synthesized speech. The results are as follows: (1) 77% of the speech synthesized by the proposed method was preferred to that synthesized by the conventional method; (2) speech synthesis runtime was reduced to one-tenth that of the conventional method; (3) the mean opinion score (MOS) was 3.94 in a five-point MOS test, and 37% of the synthesized speech had the same naturalness as natural speech; and (4) speech synthesis runtime increased only slightly despite the larger speech database. These results show the effectiveness of the proposed method.
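The abstract does not spell out the search procedure, so the following is only a toy sketch of the general idea of covering a target phoneme string with variable-length sequences drawn from a database. It is not the authors' algorithm: the function name `segment_into_units`, the `inventory` set, the `max_len` limit, and the "fewest units" criterion are all assumptions made for illustration, and phonetic context and acoustic costs are ignored entirely.

```python
def segment_into_units(target, inventory, max_len=4):
    """Cover `target` (a list of phoneme labels) with the fewest sequences
    found in `inventory` (a set of phoneme tuples). Hypothetical helper."""
    n = len(target)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: minimum number of units covering target[:i]
    back = [None] * (n + 1)  # back[i]: start index of the last unit in that cover
    best[0] = 0
    for i in range(1, n + 1):
        for length in range(1, min(max_len, i) + 1):
            unit = tuple(target[i - length:i])
            if unit in inventory and best[i - length] + 1 < best[i]:
                best[i] = best[i - length] + 1
                back[i] = i - length
    if best[n] == inf:
        return None  # some phoneme cannot be covered by the inventory
    units = []
    i = n
    while i > 0:
        j = back[i]
        units.append(tuple(target[j:i]))
        i = j
    return units[::-1]


# Toy inventory: single phonemes plus a few longer sequences (illustrative only).
inventory = {("k",), ("o",), ("n",), ("i",), ("ch",), ("w",), ("a",),
             ("k", "o", "n"), ("n", "i"), ("ch", "i", "w", "a")}
target = ["k", "o", "n", "n", "i", "ch", "i", "w", "a"]
units = segment_into_units(target, inventory)
```

Longer sequences mean fewer concatenation points, which is one plausible reason variable-length units can help naturalness; a real system would also score candidates by context match and join quality.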

Similar resources

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis

The use of visemes as atomic speech units in visual speech analysis and synthesis systems is well-established. Viseme labels are determined using a many-to-one phoneme-to-viseme mapping. However, due to the visual coarticulation effects, an accurate mapping from phonemes to visemes should define a many-to-many mapping scheme. In this research it was found that neither the use of standardized no...

Evaluation of a system for concatenative articulatory visual speech synthesis

A method for concatenative articulatory visual speech synthesis has been evaluated. The method uses concatenated units of articulatory parameter transitions, from the middle of one phoneme to the middle of the next, as input to a 3D parametric tongue model. The units were created by segmentation of the electromagnetic articulography (EMA) measures in a database of 460 phonetically ba...

Unit selection in a concatenative speech synthesis system using a large speech database

One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database ca...

Discontinuity Removal in Concatenative Synthesized Speech

Concatenative synthesis concatenates segments of prerecorded natural human speech. It requires a database of previously recorded human speech covering all the possible segments to be synthesised. A segment might be a phoneme, syllable, word, phrase, or any combination of these. Concatenative speech synthesis is currently the most practical method for the generation of realistic speech. There are mainly two types ...

Publication date: 2004